Supplementary Material Rethinking Reprojection: Closing the Loop for Pose-aware Shape Reconstruction from a Single Image
Abstract
We denote each fully-connected layer by fc(d), where d is the output dimension; each volumetric convolution layer by conv3D(k, c, s), with kernel size k, c channels, and stride s along all three spatial axes; each 2D convolution layer by conv2D(k, c, s); and each volumetric transpose convolution layer by deconv3D(k, c, s).

Encoder and Generator for Aligned Shapes

The variational aligned-shape encoder takes a 30×30×30×1 tensor as input and consists of three convolution layers, conv3D(4, 32, 2), conv3D(4, 64, 2), conv3D(4, 64, 2), followed by two fully-connected layers, fc(200) and fc(200), which regress from the last convolution feature to the 200-dimensional mean and variance vectors of the style embedding, following [2]. The decoder takes the 200-dimensional style vector and consists of one fully-connected layer fc(8192), which maps the input vector to a 4×4×4×128 convolutional feature, and three transpose convolution layers, deconv3D(4, 64, 2), deconv3D(4, 32, 2), and deconv3D(4, 1, 2), which output the reconstructed shape of size 30×30×30×1. All convolution and transpose convolution layers are batch normalized, except the first convolution and the last transpose convolution layer. LeakyReLU [3][1] is the rectifier for all layers except the output layer, which uses tanh. This architecture is also used for the 3D VAE in Section 4.2.

Image to Style/Pose Regressors

The two regressors share an identical stack of convolution layers: conv2D(11, 64, 4), conv2D(5, 128, 2), conv2D(5, 256, 2), conv2D(5, 512, 2), conv2D(3, 200, 2). For the style regressor, an fc(200) layer connects the last convolution layer to the style parameters; for the pose regressor, an fc(5) layer is used instead. All but the first convolution layer are batch normalized and rectified with LeakyReLU.
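As a concrete illustration of the conv3D(k, c, s) notation above, the parameter counts implied by the aligned-shape encoder's convolution stack can be tallied with a short script. This is a minimal sketch, not the authors' code: the layer list mirrors the text, but the presence of bias terms is an assumption (the paper does not state it), and strides do not affect the count.

```python
# Parameter tally for the aligned-shape encoder's convolution stack.
# conv3D(k, c, s) = kernel size k (cubic), c output channels, stride s.
# Bias terms are assumed; stride and padding do not change parameter counts.

def conv3d_params(k, in_ch, out_ch):
    """Weights (k^3 * in_ch * out_ch) plus one bias per output channel."""
    return k ** 3 * in_ch * out_ch + out_ch

# (kernel, out_channels) for the three encoder convolutions; the input
# 30x30x30x1 voxel grid has a single channel.
encoder = [(4, 32), (4, 64), (4, 64)]

in_ch, total = 1, 0
for k, out_ch in encoder:
    n = conv3d_params(k, in_ch, out_ch)
    print(f"conv3D({k}, {out_ch}, 2): {n:,} parameters")
    total += n
    in_ch = out_ch

print(f"conv stack total: {total:,}")  # 395,424
```

The same helper applies unchanged to the 2D regressor stack by replacing `k ** 3` with `k ** 2`.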
Similar Papers
Pose and Shape Estimation with Discriminatively Learned Parts
We introduce a new approach for estimating the 3D pose and the 3D shape of an object from a single image. Given a training set of view exemplars, we learn and select appearance-based discriminative parts which are mapped onto the 3D model from the training set through a facility location optimization. The training set of 3D models is summarized into a sparse set of shapes from which we can gene...
Unite the People: Closing the loop between 3D and 2D Human Representations Supplementary Material
We have obtained human segmentation labels to integrate shape information into the SMPLify 3D fitting procedure and for the evaluation of methods introduced in the main paper. The labels consist of foreground segmentation for multiple human pose datasets and six body part segmentation for the LSP dataset. Whereas we discuss their use in the context of the UP dataset in the main paper, we discus...
Closing the Loop in Appearance-Guided Structure-from-Motion for Omnidirectional Cameras
In this paper, we present a method that allows us to recover a 400 meter trajectory purely from monocular omnidirectional images very accurately. The method uses a novel combination of appearance-guided structure from motion and loop closing. The appearance-guided monocular structure-from-motion scheme is used for initial motion estimation. Appearance information is used to correct the rotation...
Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose Supplementary Material
In this supplementary, we provide material that could not be included in the main manuscript due to space constraints. First, Section 1 provides additional details for the exact representation of the volumetric space and the way we can obtain metric pose estimates from the voxelized estimates. Section 2 presents full results on Human3.6M using the reconstruction error for evaluation. Section 3 ...
Robust Energy Minimization for BRDF-Invariant Shape from Light Fields: Supplementary Material
The supplementary material is divided into three parts. In Section 1, we present a detailed comparison on the effectiveness of various terms of our energy function using synthetic data. In Section 2, we show more mesh reconstructions with real data using a single light-field image. In Section 3, we present detailed derivations of our optimization framework for depth reconstruction and the BRDF-...